A note on security
-
Although pandoc itself will not create or modify any files other than those you explicitly ask it to create (with the exception of temporary files used in producing PDFs), a filter or custom writer could in principle do anything on your file system. Please audit filters and custom writers very carefully before using them.
-
Several input formats (including LaTeX, Org, RST, and Typst) support include directives that allow the contents of a file to be included in the output. An untrusted attacker could use these to view the contents of files on the file system. (Using the --sandbox option can protect against this threat.)
-
Several output formats (including RTF, FB2, HTML with --self-contained, EPUB, Docx, and ODT) will embed encoded or raw images into the output file. An untrusted attacker could exploit this to view the contents of non-image files on the file system. (Using the --sandbox option can protect against this threat, but will also prevent including images in these formats.)
-
In reading HTML files, pandoc will attempt to include the contents of iframe elements by fetching content from the local file or URL specified by src. If untrusted HTML is processed on a server, this has the potential to reveal anything readable by the process running the server. Using -f html+raw_html will mitigate this threat by causing the whole iframe to be parsed as a raw HTML block. Using --sandbox will also protect against the threat.
-
If your application uses pandoc as a Haskell library (rather than shelling out to the executable), it is possible to use it in a mode that fully isolates pandoc from your file system, by running the pandoc operations in the PandocPure monad. See the document Using the pandoc API for more details. (This corresponds to the use of the --sandbox option on the command line.)
-
Pandoc’s parsers can exhibit pathological performance on some corner cases. It is wise to put any pandoc operations under a timeout, to avoid DOS attacks that exploit these issues. If you are using the pandoc executable, you can add the command line options +RTS -M512M -RTS (for example) to limit the heap size to 512MB. Note that the commonmark parser (including commonmark_x and gfm) is much less vulnerable to pathological performance than the markdown parser, so it is a better choice when processing untrusted input.
-
The HTML generated by pandoc is not guaranteed to be safe. If raw_html is enabled for the Markdown input, users can inject arbitrary HTML. Even if raw_html is disabled, users can include dangerous content in URLs and attributes. To be safe, you should run all HTML generated from untrusted user input through an HTML sanitizer.
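
The command-line precautions above (the --sandbox option, a heap limit, a timeout, and the commonmark reader) can be combined when an application shells out to the pandoc executable. The following Python sketch is one possible arrangement, not a recommended or official wrapper. The function names build_pandoc_argv and convert_untrusted, the 10-second timeout, and the choice of HTML output are assumptions made for illustration; it also assumes a pandoc executable on the PATH that is recent enough to support --sandbox.

```python
import subprocess


def build_pandoc_argv(reader="commonmark", writer="html", heap_limit="512M"):
    """Build a pandoc command line using the precautions described above.

    The function name and defaults are illustrative, not part of pandoc.
    """
    return [
        "pandoc",
        "--sandbox",          # keep readers and writers away from the file system
        "-f", reader,         # commonmark is less prone to pathological inputs
        "-t", writer,
        "+RTS", f"-M{heap_limit}", "-RTS",  # cap the heap of the pandoc process
    ]


def convert_untrusted(text, timeout_seconds=10.0):
    """Convert untrusted text by piping it to pandoc on standard input.

    Raises subprocess.TimeoutExpired if pandoc runs longer than the timeout
    (the child process is killed), and subprocess.CalledProcessError if
    pandoc exits with a non-zero status.
    """
    result = subprocess.run(
        build_pandoc_argv(),
        input=text.encode("utf-8"),   # pandoc expects UTF-8 input
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout_seconds,
        check=True,
    )
    return result.stdout.decode("utf-8")
```

A caller would invoke convert_untrusted(user_text) and treat both exceptions as a rejected document. The HTML returned this way is still untrusted and should go through an HTML sanitizer, as the last item above explains.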
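
To illustrate what "dangerous content in URLs" can mean, the Python sketch below checks a link target against a small allowlist of URL schemes, so that values such as javascript: URLs are rejected. It is an illustration of one check a sanitizer performs, not a substitute for a real HTML sanitizer. The function name has_allowed_scheme and the particular allowlist are assumptions, and the input is assumed to be an attribute value whose HTML entities have already been decoded.

```python
import re

# Schemes considered acceptable in this sketch; adjust to the application.
ALLOWED_SCHEMES = {"http", "https", "mailto"}

# A URL scheme: a letter followed by letters, digits, "+", "." or "-", then ":".
_SCHEME_RE = re.compile(r"^([A-Za-z][A-Za-z0-9+.\-]*):")

# Characters browsers strip from the start of a URL (C0 controls and space).
_LEADING_JUNK = "".join(chr(c) for c in range(0x21))


def has_allowed_scheme(url):
    """Return True if url is relative or uses a scheme on the allowlist."""
    # Browsers drop ASCII tab, LF, and CR anywhere in a URL before parsing it,
    # so "java\tscript:" must be treated like "javascript:".
    cleaned = url.translate({9: None, 10: None, 13: None})
    cleaned = cleaned.lstrip(_LEADING_JUNK)
    match = _SCHEME_RE.match(cleaned)
    if match is None:
        return True  # no scheme: a relative URL
    return match.group(1).lower() in ALLOWED_SCHEMES
```

A check like this covers only URL schemes. Attributes, raw HTML, and other vectors still require a full sanitizer applied to the generated HTML.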